[57] ABSTRACT

A system for coding of digital video images such as
bi-directionally predicted video object planes (B-VOPs), in 
particular, where the B-VOP and/or a reference image used 
to code the B-VOP is interlaced coded. For a B-VOP 
macroblock which is co-sited with a field predicted mac- 
roblock of a future anchor picture, direct mode prediction is 
made by calculating four field motion vectors, then gener- 
ating the prediction macroblock. The four field motion 
vectors and their reference fields are determined from (1) an 
offset term of the current macroblock's coding vector, (2) the
two future anchor picture field motion vectors, (3) the 
reference field used by the two field motion vectors of the 
co-sited future anchor macroblock, and (4) the temporal 
spacing, in field periods, between the current B-VOP fields 
and the anchor fields. Additionally, a coding mode decision 
process for the current MB selects a forward, backward, or 
average field coding mode according to a minimum sum of 
absolute differences (SAD) error which is obtained over the 
top and bottom fields of the current MB. 

27 Claims, 8 Drawing Sheets 
PREDICTION AND CODING OF BI-DIRECTIONALLY PREDICTED VIDEO
OBJECT PLANES FOR INTERLACED DIGITAL VIDEO

This application claims the benefit of U.S. Provisional 
Application No. 60/040,120, filed Mar. 7, 1997, and U.S.
Provisional Application No. 60/042,245, filed Mar. 31, 
1997. 

BACKGROUND OF THE INVENTION 

The present invention provides a method and apparatus 
for coding of digital video images such as bi-directionally 
predicted video object planes (B-VOPs), in particular, where 
the B-VOP and/or a reference image used to code the 
B-VOP is interlaced coded. 

The invention is particularly suitable for use with various
multimedia applications, and is compatible with the
MPEG-4 Verification Model (VM) 8.0 standard (MPEG-4
VM 8.0) described in document ISO/IEC/JTC1/SC29/WG11
N1796, entitled "MPEG-4 Video Verification Model
Version 8.0", Stockholm, July 1997, incorporated herein by
reference. The MPEG-2 standard is a precursor to the
MPEG-4 standard, and is described in document ISO/IEC
13818-2, entitled "Information Technology - Generic Coding
of Moving Pictures and Associated Audio, Recommendation
H.262," Mar. 25, 1994, incorporated herein by reference.

MPEG-4 is a coding standard which provides a flexible 
framework and an open set of coding tools for 
communication, access, and manipulation of digital audio- 
visual data. These tools support a wide range of features. 
The flexible framework of MPEG-4 supports various com- 
binations of coding tools and their corresponding function- 
alities for applications required by the computer, 
telecommunication, and entertainment (i.e., TV and film) 
industries, such as database browsing, information retrieval, 
and interactive communications. 

MPEG-4 provides standardized core technologies allow- 
ing efficient storage, transmission and manipulation of video 
data in multimedia environments. MPEG-4 achieves effi- 
cient compression, object scalability, spatial and temporal 
scalability, and error resilience. 

The MPEG-4 video VM coder/decoder (codec) is a block- 
and object-based hybrid coder with motion compensation. 
Texture is encoded with an 8x8 Discrete Cosine Transfor- 
mation (DCT) utilizing overlapped block-motion compen- 
sation. Object shapes are represented as alpha maps and 
encoded using a Content-based Arithmetic Encoding (CAE) 
algorithm or a modified DCT coder, both using temporal 
prediction. The coder can handle sprites as they are known 
from computer graphics. Other coding methods, such as 
wavelet and sprite coding, may also be used for special 
applications. 

Motion compensated texture coding is a well known 
approach for video coding, and can be modeled as a three-stage
process. The first stage is signal processing which
includes motion estimation and compensation (ME/MC) and 
a two-dimensional (2-D) spatial transformation. The objec- 
tive of ME/MC and the spatial transformation is to take 
advantage of temporal and spatial correlations in a video 
sequence to optimize the rate-distortion performance of 
quantization and entropy coding under a complexity con- 
straint. The most common technique for ME/MC has been 
block matching, and the most common spatial transforma- 
tion has been the DCT. 
However, special concerns arise for ME/MC of macroblocks
(MBs) in B-VOPs when the MB is itself interlaced
coded and/or uses reference images which are interlaced
coded.

In particular, it would be desirable to have an efficient
technique for providing motion vector (MV) predictors for
a MB in a B-VOP. It would also be desirable to have an
efficient technique for direct mode coding of a field coded
MB in a B-VOP. It would further be desirable to have a
coding mode decision process for a MB in a field coded
B-VOP for selecting the reference image which results in
the most efficient coding.

The present invention provides a system having the above 
and other advantages. 

SUMMARY OF THE INVENTION

In accordance with the present invention, a method and
apparatus are presented for coding of digital video images
such as a current image (e.g., macroblock) in a
bi-directionally predicted video object plane (B-VOP), in
particular, where the current image and/or a reference image
used to code the current image is interlaced (e.g., field)
coded.

In a first aspect of the invention, a method provides direct
mode motion vectors (MVs) for a current bi-directionally
predicted, field coded image such as a macroblock (MB)
having top and bottom fields, in a sequence of digital video
images. A past field coded reference image having top and
bottom fields, and a future field coded reference image
having top and bottom fields are determined. The future
image is predicted using the past image such that MV_{top}, a
forward MV of the top field of the future image, references
either the top or bottom field of said past image. The field
which is referenced contains a best-match MB for a MB in
the top field of the future image.

This MV is termed a "forward" MV since, although it
references a past image (e.g., backward in time), the
prediction is from the past image to the future image, e.g.,
forward in time. As a mnemonic, the prediction direction
may be thought of as being opposite the direction of the
corresponding MV.

Similarly, MV_{bot}, a forward motion vector of the bottom
field of the future image, references either the top or bottom
field of the past image. Forward and backward MVs are
determined for predicting the top and/or bottom fields of the
current image by scaling the forward MV of the corresponding
field of the future image.

In particular, MV_{f,top}, the forward motion vector for
predicting the top field of the current image, is determined
according to the expression

MV_{f,top} = (MV_{top} * TR_{B,top}) / TR_{D,top} + MV_D,

where MV_D is a delta motion vector for a search area,
TR_{B,top} corresponds to a temporal spacing between the top
field of the current image and the field of the past image
which is referenced by MV_{top}, and TR_{D,top} corresponds to
a temporal spacing between the top field of the future image
and the field of the past image which is referenced by
MV_{top}. The temporal spacing may be related to a frame rate
at which the images are displayed.

Similarly, MV_{f,bot}, the forward motion vector for predicting
the bottom field of the current image, is determined
according to the expression

MV_{f,bot} = (MV_{bot} * TR_{B,bot}) / TR_{D,bot} + MV_D,

where MV_D is a delta motion vector, TR_{B,bot} corresponds to
a temporal spacing between the bottom field of the current
image and the field of the past image which is referenced by
MV_{bot}, and TR_{D,bot} corresponds to a temporal spacing
between the bottom field of the future MB and the field of
the past MB which is referenced by MV_{bot}.
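As a minimal sketch of the direct-mode scaling stated in this summary, the following Python function computes the forward and backward vectors for one field of the current MB from the co-sited future-anchor field vector. The function and parameter names are hypothetical, vectors are taken to be integer (x, y) pairs, and the division is assumed to truncate toward zero; the text itself only writes "/", so the rounding rule is an assumption.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero (assumed rounding rule)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def direct_field_mvs(mv_field, tr_b, tr_d, mv_delta=(0, 0)):
    """Return (forward, backward) direct-mode MVs for one field.

    mv_field: forward MV of the co-sited future-anchor field (MV_top or MV_bot)
    tr_b:     field periods between the current field and the referenced past field
    tr_d:     field periods between the future field and the referenced past field
    mv_delta: delta motion vector MV_D
    """
    # Forward vector: scaled anchor vector plus the delta vector.
    fwd = tuple(trunc_div(c * tr_b, tr_d) + d for c, d in zip(mv_field, mv_delta))
    if mv_delta == (0, 0):
        # Backward vector by scaling when the delta vector is zero.
        bwd = tuple(trunc_div((tr_b - tr_d) * c, tr_d) for c in mv_field)
    else:
        # Otherwise the backward vector is the forward vector minus the anchor vector.
        bwd = tuple(f - c for f, c in zip(fwd, mv_field))
    return fwd, bwd
```

Calling the function once with the top-field quantities and once with the bottom-field quantities yields the four field motion vectors mentioned in the abstract.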
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MV^,^^, the backward motion vector for predicting the FIG. 8 illustrates a macroblock layer structure in accor- 

top field of the current MB is determined according to the dance with the present invention, 

equation MV,,,.^-((TR^,,.^-TR^...^)*M V,,^)/TR^,,,^ r.pc:roTPTfnM hp tup 

when the delta motion vector MV^=0, or MV^ r.^^MV^^^^- DETAILED DESCRIPTION OF THE 

MVfo^ when MV^P^O. ' ' 5 INVENTION 

MVfa the backward motion vector for predicting the a method and apparatus are presented for coding of a 

bottom field of the current MB is determined according to digital video image such as a macroblock (MB) in a 

the equation UVi,tar'i(J^B,boi-'^D,bar)*^^bJf^n.bo: bi-directionally predicted video object plane (B-VOP), in 

when the delta motion vector MV^=0, or M V^,^^=M V^^,- particular, where the MB and/or a reference image used to 
MVj^ when MV^^O. lo code the MB is interlaced coded. The scheme provides a 

A corresponding decoder is also presented. method for selecting a prediction motion vector (PMV) for 

In another aspect of the invention, a method is presented the top and bottom field of a field coded current MB, 

for selecting a coding mode for a current predicted, field including forward and backward PMVs as required, as well 

coded MB having top and bottom fields, in a sequence of as for frame coded MBs. A direct coding mode for a field 

digital video MBs. The coding mode may be a backward coded MB is also presented, in addition to a coding decision 

mode, where the reference MB is temporally after the process which uses the minimum of sum of absolute differ- 

current MB in display order, a forward mode, where the ences terms to select an optimum mode, 

reference MB is before the current MB, or average (e.g., FIG. 1 is an illustration of a video object plane (VOP) 

bi-directional) mode, where an average of prior and subse- coding and decoding process in accordance with the present 
quent reference MBs is used. 20 invention. Frame 105 includes three pictorial elements, 

The method includes the step of determining a forward including a square foreground element 107, an oblong 

sum of absolute differences error, SAD^^^^^^,^ for the foreground element 108, and a landscape backdrop element 

current MB relative to a past reference MB, which corre- 109. In frame 115, the elements are designated VOPs using 

sponds to a forward coding mode. SADy^^^^^^,j indicates a segmentation mask such that VOP 117 represents the 

the error in pixel luminance values between the current MB square foreground element 107, VOP 118 represents the 

and a best match MB in the past reference MB. A backward oblong foreground element 108, and VOP 119 represents the 

sum of absolute differences error, SAD^^^j^^^^,^ for the landscape backdrop element 109. A VOP can have an 

current MB relative to a future reference MB, which corre- arbitrary shape, and a succession of VOPs is known as a 

sponds to a backward coding mode is also determined. video object. A full rectangular video frame may also be 

SAD^^jtH«^j2ff/d indicates ±e error in pixel luminance val- considered to be a VOP. Thus, the term "VOP" will be used 

ues between the current MB and a best match MB in the herein to indicate both arbitrary and non -arbitrary (e.g., 

futiu'e reference MB. rectangular) image area shapes. A segmentation mask is 

An average sum of absolute differences error, obtained using known techniques, and has a format similar 

SAD^^g^^jjg^^ for the current MB relative to an average of to that of ITU-R 601 luminance data. Each pixel is identified 

the past and ftiture reference MBs, which corresponds to an as belonging to a certain region in the video frame, 

average coding mode, is also determined. SAD„^^„^^^^;^ The frame 105 and VOP data from frame 115 are supplied 

indicates the error in pixel luminance values between the to separate encoding functions. In particular, VOPs 117, 118 

current MB and a MB which is the average of the best match and 119 undergo shape, motion and texture encoding at 

MBs of the past and future reference MBs. encoders 137, 138 and 139, respectively. With shape coding, 

The coding mode is selected according to the minimum of binary and gray scale shape information is encoded. With 

the SADs. Bias terms which account for the number of motion coding, the shape information is coded using motion 

required MVs of the respective coding modes may also be estimation within a frame. With texture coding, a spatial 

factored into the coding mode selection process. transformation such as the DCT is performed to obtain 

SAD^.^,,^^,;^, SADfc,,^,,^^,,^ and SAD,^,,^,^,,^ are ^5 transform coefficients which can be variable-length coded 

determined by summing the component terms over the top for compression. 

and bottom fields. The coded VOP data is then combined at a multiplexer 

BRIEF DESCRIPTION OF THE DRAWINGS ^^^^ ^"^^ transmission over a channel 145. 

Alternatively, the data may be stored on a recording 

FIG. 1 is an illustration of a video object plane (VOP) medium. The received coded VOP data is separated by a 

coding and decoding process in accordance with the present demultiplexer (DEMUX) 150 so that the separate VOPs 

invention. 117-119 are decoded and recovered. Frames 155, 165 and 

¥IG. 2 is a block diagram of an encoder in accordance 175 show that VOPs 117, 118 and 119, respectively, have 

with the present invention. b^en decoded and recovered and can therefore be individu- 

FIG. 3 illustrates an interpolation scheme for a half-pixel ally manipulated using a compositor 160 which interfaces 

search. with a video library 170, for example, 

FIG. 4 illustrates direct mode coding of the top field of an The compositor may be a device such as a personal 

interlaced-coded B-VOP in accordance with the present computer which is located at a user's home to allow the user 

mvention. to edit the received data to provide a customized image. For 

FIG. 5 illustrates direct mode coding of the bottom field 50 example, the user's personal video library 170 may include 

of an interlaced-coded B-VOP in accordance with the a previously stored VOP 178 (e.g., a circle) which is 

present invention. different than the received VOPs. The user may compose a 

FIG. 6 illustrates reordering of pixel lines in an adaptive frame 185 where the circular VOP 178 replaces the square 

frame/field prediction scheme in accordance with the present VOP 117, The frame 185 thus includes the received VOPs 
invention. 65 118 and 119 and the locally stored VOP 178. 

FIG. 7 is a block diagram of a decoder in accordance with In another example, the background VOP 109 may be 

the present invention. replaced by a background of the user's choosing. For 
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example, when viewing a television news broadcast, the 
announcer may be coded as a VOP which is separate from 
the background, such as a news studio. The user may select 
a background from the library 170 or from another television 
program, such as a channel with stock price or weather 
information. The user can therefore act as a video editor. 

The video library 170 may also store VOPs which are 
received via the channel 145, and may access VOPs and 
other image elements via a network such as the Internet.
Generally, a video session comprises a single VOP, or a 
sequence of VOPs. 

The video object coding and decoding process of FIG. 1 
enables many entertainment, business and educational 
applications, including personal computer games, virtual 
environments, graphical user interfaces, videoconferencing, 
Internet applications and the like. In particular, the capability 
for ME/MC with interlaced coded (e.g., field mode) VOPs in 
accordance with the present invention provides even greater 
capabilities. 

FIG. 2 is a block diagram of an encoder in accordance 
with the present invention. The encoder is suitable for use 
with both predictive-coded VOPs (P-VOPs) and 
bi-directionally coded VOPs (B-VOPs). 

P-VOPs may include a number of macroblocks (MBs)
which may be coded individually using an intra-frame mode
or an inter-frame mode. With intra-frame (INTRA) coding,
the macroblock (MB) is coded without reference to another
MB. With inter-frame (INTER) coding, the MB is
differentially coded with respect to a temporally subsequent frame
in a mode known as forward prediction. The temporally
subsequent frame is known as an anchor frame or reference
frame. The anchor frame (e.g., VOP) must be a P-VOP or an
I-VOP, not a B-VOP. An I-VOP includes self-contained
(e.g., intra-coded) blocks which are not predictive coded.

With forward prediction, the current MB is compared to 
a search area of MBs in the anchor frame to determine the 
best match. A corresponding motion vector (MV), known as 
a backward MV, describes the displacement of the current 
MB relative to the best match MB. Additionally, an 
advanced prediction mode for P-VOPs may be used, where 
motion compensation is performed on 8x8 blocks rather 
than 16x16 MBs. Moreover, both intra-frame and inter- 
frame coded P-VOP MBs can be coded in a frame mode or 
a field mode.

B-VOPs can use the forward prediction mode as
described above in connection with P-VOPs as well as
backward prediction, bi-directional prediction, and direct
mode, which are all inter-frame techniques. B-VOPs do not
currently use intra-frame coded MBs under MPEG-4 VM
8.0, although this is subject to change. The anchor frame
(e.g., VOP) must be a P-VOP or I-VOP, not a B-VOP.

With backward prediction of B-VOPs, the current MB is 
compared to a search area of MBs in a temporally previous 
anchor frame to determine the best match. A corresponding 
MV, known as a forward MV, describes the displacement
of the current MB relative to the best match MB.
With bi-directional prediction of a B-VOP MB, the current 
MB is compared to a search area of MBs in both a tempo- 
rally previous anchor frame and a temporally subsequent 
anchor frame to determine the best match MBs. Forward and 
backward MVs describe the displacement of the current MB 
relative to the best match MBs. Additionally, an averaged 
image is obtained from the best match MBs for use in 
encoding the current MB. 
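The averaged prediction described above, together with the minimum-SAD mode selection stated in the summary, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, blocks are plain lists of pixel rows, the rounding of the average (adding 1 before halving) is an assumption, and the mode names follow the summary, where the forward mode uses the past reference MB and the backward mode uses the future reference MB.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def average_block(past, future):
    """Pixel-wise average of two best-match blocks (rounding is an assumption)."""
    return [[(p + f + 1) // 2 for p, f in zip(rp, rf)] for rp, rf in zip(past, future)]


def select_bvop_mode(current, past_match, future_match, bias=None):
    """Pick the mode with the smallest (optionally biased) SAD.

    bias: optional dict of per-mode bias terms, e.g. to penalize modes
          that need more motion vectors (values here are hypothetical).
    """
    bias = bias or {}
    costs = {
        "forward": sad(current, past_match),
        "backward": sad(current, future_match),
        "average": sad(current, average_block(past_match, future_match)),
    }
    costs = {mode: cost + bias.get(mode, 0) for mode, cost in costs.items()}
    return min(costs, key=costs.get), costs
```

For a field coded MB, the same `sad` helper could be applied separately to the top-field and bottom-field lines and the two terms summed, in line with the summary's statement that the SADs are obtained over both fields.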

With direct mode prediction of B-VOPs, an MV is derived
for an 8x8 block when the collocated MB in the following
P-VOP uses the 8x8 advanced prediction mode. The MV of
the 8x8 block in the P-VOP is linearly scaled to derive a MV
for the block in the B-VOP without the need for searching
to find a best match block.

The encoder, shown generally at 200, includes a shape
coder 210, a motion estimation function 220, a motion
compensation function 230, and a texture coder 240, which
each receive video pixel data input at terminal 205. The
motion estimation function 220, motion compensation
function 230, texture coder 240, and shape coder 210 also receive
VOP shape information input at terminal 207, such as the
MPEG-4 parameter VOP_of_arbitrary_shape. When this
parameter is zero, the VOP has a rectangular shape, and the
shape coder 210 therefore is not used.

A reconstructed anchor VOP function 250 provides a 
reconstructed anchor VOP for use by the motion estimation 
function 220 and motion compensation function 230. A 
current VOP is subtracted from a motion compensated 
previous VOP at subtracter 260 to provide a residue which 
is encoded at the texture coder 240. The texture coder 240
performs the DCT to provide texture information (e.g.,
transform coefficients) to a multiplexer (MUX) 280. The
texture coder 240 also provides information which is 
summed with the output from the motion compensator 230 
at a summer 270 for input to the previous reconstructed VOP 
function 250. 

Motion information (e.g., motion vectors) is provided 
from the motion estimation function 220 to the MUX 280, 
while shape information which indicates the shape of the 
VOP is provided from the shape coding function 210 to the 
MUX 280. The MUX 280 provides a corresponding multi- 
plexed data stream to a buffer 290 for subsequent commu- 
nication over a data channel. 

The pixel data which is input to the encoder may have a
YUV 4:2:0 format. The VOP is represented by means of a
bounding rectangle. The top left coordinate of the bounding
rectangle is rounded to the nearest even number not greater
than the top left coordinates of the tightest rectangle.
Accordingly, the top left coordinate of the bounding
rectangle in the chrominance component is one-half that of the
luminance component.
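The rounding of the bounding rectangle origin can be illustrated with a short sketch. The function name is hypothetical, and non-negative integer pixel coordinates are assumed.

```python
def align_bounding_origin(x, y):
    """Align the top-left corner of the tightest rectangle for 4:2:0 data.

    Each luminance coordinate is rounded down to the nearest even number
    (not greater than the input); the chrominance coordinate is then
    exactly one-half of the luminance coordinate.
    """
    luma = (x - (x % 2), y - (y % 2))
    chroma = (luma[0] // 2, luma[1] // 2)
    return luma, chroma
```

Because the luminance origin is forced to be even, halving it for the chrominance planes never produces a fractional position.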

FIG. 3 illustrates an interpolation scheme for a half-pixel
search. Motion estimation and motion compensation (ME/MC)
generally involve matching a block of a current video
frame (e.g., a current block) with a block in a search area of
a reference frame (e.g., a predicted block or reference
block). For predictive (P) coded images, the reference block
is in a previous frame. For bi-directionally predicted (B)
coded images, predicted blocks in previous and subsequent
frames may be used. The displacement of the predicted
block relative to the current block is the motion vector
(MV), which has horizontal (x) and vertical (y) components.
Positive values of the MV components indicate that the
predicted block is to the right of, and below, the current
block.
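A minimal sketch of block matching is given below, assuming an exhaustive integer-pixel search with a sum of absolute differences (SAD) cost. The function names, the list-of-rows frame layout, and the square search window are assumptions for illustration; half-pixel refinement is not included.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the
    reference block displaced by (dx, dy)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return ((mv_x, mv_y), sad) of the best match in the search area.

    Positive mv_x / mv_y mean the predicted block lies to the right of /
    below the current block, matching the sign convention in the text.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0]
```

For example, if the current block is an exact copy of the reference content one pixel to the right and two pixels below, the search returns the vector (1, 2) with a SAD of zero.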

A motion compensated difference block is formed by
subtracting the pixel values of the predicted block from
those of the current block point by point. Texture coding is
then performed on the difference block. The coded MV and
the coded texture information of the difference block are
transmitted to the decoder. The decoder can then reconstruct
an approximated current block by adding the quantized
difference block to the predicted block according to the MV.
The block for ME/MC can be a 16x16 frame block
(macroblock), an 8x8 block or a 16x8 field block.

Accuracy of the MV is set at half-pixel. Interpolation 
must be used on the anchor frame so that p(i+x,j+y) is 
defined for x or y being half of an integer. Interpolation is
performed as shown in FIG. 3. Integer pixel positions are
represented by the symbol "+", as shown at A, B, C and D.
Half-pixel positions are indicated by circles, as shown at a,
b, c and d. As seen, a=A, b=(A+B)//2, c=(A+C)//2, and
d=(A+B+C+D)//4, where "//" denotes rounded division.
Further details of the interpolation are discussed in MPEG-4
VM 8.0 referred to previously as well as commonly assigned
U.S. patent application Ser. No. 08/897,847 to Eifrig et al.,
filed Jul. 21, 1997, entitled "Motion Estimation and
Compensation of Video Object Planes for Interlaced Digital
Video", incorporated herein by reference.

FIG. 6 illustrates reordering of pixel lines in an adaptive
frame/field prediction scheme in accordance with the present
invention. In a first aspect of the advanced prediction
technique, an adaptive technique is used to decide whether
a current macroblock (MB) of 16x16 pixels should be
ME/MC coded as is, or divided into four blocks of 8x8
pixels each, where each 8x8 block is ME/MC coded
separately, or whether field based motion estimation should
be used, where pixel lines of the MB are reordered to group
the same-field lines in two 16x8 field blocks, and each 16x8
block is separately ME/MC coded.

A field mode image, e.g., a 16x16 macroblock (MB), is
shown generally at 600. The MB includes even-numbered
lines 602, 604, 606, 608, 610, 612, 614 and 616, and
odd-numbered lines 603, 605, 607, 609, 611, 613, 615 and 617.
The even and odd lines are thus interleaved, and form top
and bottom (or first and second) fields, respectively.

When the pixel lines in image 600 are permuted to form
same-field luminance blocks, the MB shown generally at
650 is formed. Arrows, shown generally at 645, indicate the
reordering of the lines 602-617. For example, the even line
602, which is the first line of MB 600, is also the first line
of MB 650. The even line 604 is reordered as the second line
in MB 650. Similarly, the even lines 606, 608, 610, 612, 614
and 616 are reordered as the third through eighth lines,
respectively, of MB 650. Thus, a 16x8 luminance region 680
with even-numbered lines is formed. Similarly, the
odd-numbered lines 603, 605, 607, 609, 611, 613, 615 and 617
form a 16x8 region 685.

The decision process for choosing the MC mode for
P-VOPs is as follows. For frame mode video, first obtain the
Sum of Absolute Differences (SAD) for a single 16x16 MB,
e.g., SAD_{16}(MV_x, MV_y); and for four 8x8 blocks, e.g.,
SAD_8(MV_{x1}, MV_{y1}), SAD_8(MV_{x2}, MV_{y2}),
SAD_8(MV_{x3}, MV_{y3}), and SAD_8(MV_{x4}, MV_{y4}). If

\sum_{i=1}^{4} SAD_8(MV_{xi}, MV_{yi}) < SAD_{16}(MV_x, MV_y) - 129,

choose 8x8 prediction; otherwise, choose 16x16 prediction.
The constant "129" is obtained from Nb/2+1, where Nb is
the number of non-transparent pixels in a MB.

For interlaced video, obtain SAD_{top}(MV_{x,top}, MV_{y,top}),
SAD_{bottom}(MV_{x,bottom}, MV_{y,bottom}), where
(MV_{x,top}, MV_{y,top}) and (MV_{x,bottom}, MV_{y,bottom}) are
the motion vector (MV) for both top (even) and bottom (odd)
fields. Then, choose the reference field which has the
smallest SAD (e.g., for SAD_{top} and SAD_{bottom}) from the
field half sample search.

The overall prediction mode decision is based on choosing
the minimum of:

(a) SAD_{16}(MV_x, MV_y),

(b) \sum_{i=1}^{4} SAD_8(MV_{xi}, MV_{yi}) + 129,

and (c) SAD_{top}(MV_{x,top}, MV_{y,top}) +
SAD_{bottom}(MV_{x,bottom}, MV_{y,bottom}) + 65.

If term (a) is the minimum, 16x16 prediction is used. If term
(b) is the minimum, 8x8 motion compensation (advanced
prediction mode) is used. If term (c) is the minimum, field
based motion estimation is used. The constant "65" is
obtained from Nb/4+1.

If 8x8 prediction is chosen, there are four MVs for the
four 8x8 luminance blocks, i.e., one MV for each 8x8 block.
The MV for the two chrominance blocks is then obtained by
taking an average of these four MVs and dividing the
average value by two. Since each MV for the 8x8 luminance
block has a half-pixel accuracy, the MV for the chrominance
blocks may have a sixteenth pixel value. Table 1, below,
specifies the conversion of a sixteenth pixel value to a
half-pixel value for chrominance MVs. For example, 0
through 2/16 are rounded to 0, 3/16 through 13/16 are
rounded to 1/2, and 14/16 and 15/16 are rounded to 2/2=1.

TABLE 1

1/16 pixel value   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
1/2 pixel value    0  0  0  1  1  1  1  1  1  1  1  1  1  1  2  2
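The P-VOP mode decision and the sixteenth-pixel rounding of Table 1 can be sketched as below. This is a hedged illustration: the function names are hypothetical, `nb` defaults to 256 (a fully opaque 16x16 MB, giving the constants 129 and 65), luminance MV components are assumed to be integers in half-pixel units, and the symmetric treatment of negative sums is an assumption that the text does not state.

```python
# Table 1: sixteenth-pixel remainder -> half-pixel units.
ROUND_TAB_16 = (0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2)


def choose_pvop_mode(sad16, sad8_blocks, sad_top, sad_bottom, nb=256):
    """Select 16x16, 8x8 or field prediction by the minimum biased SAD."""
    candidates = {
        "16x16": sad16,                                 # term (a)
        "8x8": sum(sad8_blocks) + nb // 2 + 1,          # term (b), +129 for nb=256
        "field": sad_top + sad_bottom + nb // 4 + 1,    # term (c), +65 for nb=256
    }
    # On a tie, min() keeps the first entry, so 16x16 is preferred.
    return min(candidates, key=candidates.get)


def chroma_component(luma_components):
    """Chrominance MV component (half-pixel units) from four 8x8 luminance
    MV components (half-pixel units).

    Averaging four values and halving equals dividing the sum by 8, so the
    sum itself is the chrominance displacement in sixteenths of a pixel.
    """
    total = sum(luma_components)
    mag = abs(total)
    half_pel = ROUND_TAB_16[mag % 16] + (mag // 16) * 2
    return half_pel if total >= 0 else -half_pel  # sign handling is assumed
```

As a usage example, four luminance components of one half-pixel each sum to 4/16 of a chrominance pixel, which Table 1 rounds to one half-pixel.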



With field prediction, there are two MVs for the two 16x8
blocks. The luminance prediction is generated as follows.
The even lines of the MB (e.g., lines 602, 604, 606, 608, 610,
612, 614 and 616) are defined by the top field MV using the
reference field specified. The MV is specified in frame
coordinates such that full pixel vertical displacements
correspond to even integral values of the vertical MV
coordinate, and a half-pixel vertical displacement is denoted
by odd integral values. When a half-pixel vertical offset is
specified, only pixels from lines within the same reference
field are combined.

The MV for the two chrominance blocks is derived from
the (luminance) MV by dividing each component by two,
then rounding. The horizontal component is rounded by
mapping all fractional values into a half-pixel offset. The
vertical MV component is an integer and the resulting
chrominance MV vertical component is rounded to an
integer. If the result of dividing by two yields a non-integral
value, it is rounded to the adjacent odd integer. Note that the
odd integral values denote vertical interpolation between
lines of the same field.

The second aspect of the advanced prediction technique is
overlapped MC for luminance blocks, discussed in greater
detail in MPEG-4 VM 8.0 and Eifrig et al. application
referred to previously.
Specific coding techniques for B-VOPs are now discussed.
For INTER coded VOPs such as B-VOPs, there are
four prediction modes, namely, direct mode, interpolate
(e.g., averaged or bi-directional) mode, backward mode, and
forward mode. The latter three modes are non-direct modes.
Forward only, or backward only prediction are also known
as "unidirectional" prediction. The predicted blocks of the
B-VOP are determined differently for each mode.
Furthermore, blocks of a B-VOP and the anchor block(s)
may be progressive (e.g., frame) coded or interlaced (e.g.,
field) coded.

A single B-VOP can have different MBs which are
predicted with different modes. The term "B-VOP" only
indicates that bi-directionally predicted blocks may be
included, but this is not required. In contrast, with P-VOPs
and I-VOPs, bi-directionally predicted MBs are not used.

For non-direct mode B-VOP MBs, MVs are coded
differentially. For forward MVs in forward and bi-directional
modes, and backward MVs in backward and bi-directional
modes, the same-type MV (e.g., forward or backward) of
the MB which immediately precedes the current MB in the
same row is used as a predictor. This is the same as the
immediately preceding MB in raster order, and generally, in
transmission order. However, if the raster order differs from
the transmission order, the MVs of the immediately
preceding MB in transmission order should be used to avoid the
need to store and re-order the MBs and corresponding MVs
at the decoder.

Using the same-type MV, and assuming the transmission
order is the same as the raster order, and that the raster order
is from left to right, top to bottom, the forward MV of the
left-neighboring MB is used as a predictor for the forward
MV of the current MB of the B-VOP. Similarly, the
backward MV of the left-neighboring MB is used as a predictor
for the backward MV of the current MB of the B-VOP. The
MVs of the current MB are then differentially encoded using
the predictors. That is, the difference between the predictor
and the MV which is determined for the current MB is
transmitted as a motion vector difference to a decoder. At the
decoder, the MV of the current MB is determined by
recovering and adding the PMV and the difference MV.

In case the current MB is located on the left edge of the
VOP, the predictor for the current MB is set to zero.

For interlaced-coded B-VOPs, each of the top and bottom
fields have two associated prediction motion vectors, for a
total of four MVs. The four prediction MVs represent, in
transmission order, the top field forward and bottom field
forward of the previous anchor MB, and the top field
backward and bottom field backward of the next anchor MB.
The current MB and the forward MB, and/or the current MB
and the backward MB, may be separated by one or more
intermediate images which are not used for ME/MC coding
of the current MB. B-VOPs do not contain INTRA coded
MBs, so each MB in the B-VOP will be ME/MC coded. The
forward and backward anchor MBs may be from a P-VOP
or I-VOP.

TABLE 2

Prediction function        PMV type
Top field, forward         0
Bottom field, forward      1
Top field, backward        2
Bottom field, backward     3

TABLE 3

Macroblock mode            PMV type used
Frame, forward             0
Frame, backward            2
Frame, bi-directional      0, 2
Field, forward             0, 1
Field, backward            2, 3
Field, bi-directional      0, 1, 2, 3

For example, Table 3 shows that, for a current field mode
MB with a forward prediction mode (e.g., "Field, forward"),
the top field forward ("0") and bottom field forward ("1")
motion vector predictors are used.

After being used in differential coding, the motion vectors
of a current MB become the PMVs for a subsequent MB, in
transmission order. The PMVs are reset to zero at the
beginning of each row of MBs since the MVs of a MB at the
end of a preceding row are unlikely to be similar to the MVs
of a MB at the beginning of a current row. The predictors are
also not used for direct mode MBs. For skipped MBs, the
PMVs retain the last value.

With direct mode coding of B-VOP MBs, no vector
differences are transmitted. Instead, the forward and
backward MVs are directly computed at the decoder from the
MVs of the temporally next P-VOP MB, with correction by
a single delta MV, which is not predicted. The technique is
efficient since less MV data is transmitted.

Table 4 below summarizes which PMVs are used to code
the motion vectors of the current B-VOP MB based on the
previous and current MB types. For B-VOPs, an array of
prediction motion vectors, pmv[], may be provided which are
indexed from zero to three (e.g., pmv[0], pmv[1], pmv[2]
and pmv[3]). The indexes pmv[] are not transmitted, but the
decoder can determine the pmv[] index to use according to
the MV coding type and the particular vector being decoded.
After coding a B-VOP MB, some of the PMV vectors are
updated to be the same as the motion vectors of the current
MB. The first one, two or four PMVs are updated depending
on the number of MVs associated with the current MB.

For example, a forward, field predicted MB has two
motion vectors, where pmv[0] is the PMV for the top field,
forward, and pmv[1] is the PMV for the bottom field,
forward. For a backward, field predicted MB, pmv[2] is the
PMV for the top field backward, and pmv[3] is the PMV for
the bottom field, backward. For a bi-directional, field
predicted MB, pmv[0] is the PMV for the top field, forward,
pmv[1] is the PMV for the bottom field, forward, pmv[2] is
the PMV for the top field backward, and pmv[3] is the PMV
for the bottom field backward. For a forward or backward

or I-VOP, and may be frame or field coded. ^^^^^^^ ^^^^^ ^^^^ 3_^qP ^^^^ ^ ^ 

For interlaced, non-du-ect mode B-VOP MBs, four pos- only pmv[0] is used for forward, and pmv[2] is used for 

sible prediaion motion vectors (PMVs) are shown in Table backward. For an average (e.g., bi-directionally) predicted 

2 below. The first column of Table 2 shows the prediction frame mode B-VOP MB, there are two MVs, namely, 

function, while the second column shows a designator for 55 pmv[0] for the forward MV, and pmv[2] for the backward 

the PMV. These PMVs are used as shown in Table 3 below MV The row designated "pmv[]'s to update" indicates 

for the different MB prediction modes. whether one, two or four MVs are updated. 
TABLE 4

Prediction Motion Vector Index pmv[]

                                     Current Macroblock type

Previous Macroblock            Forward,  Backward,  Average,  Forward,  Backward,  Average,
type in transmission           Frame     Frame      Frame     Field     Field      Field
order                  Direct  Mode      Mode       Mode      Mode      Mode       Mode

pmv[]'s to update      none    0,1       2,3        0,1,2,3   0,1       2,3        0,1,2,3
pmv[]'s to use         none    0         2          0,2       0,1       2,3        0,1,2,3



It will be appreciated that Table 4 is merely a shorthand 
notation for implementing the technique of the present 
invention for selecting a prediction MV for a current MB.
However, the scheme may be expressed in various other 
ways. 
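
As one such alternative expression, the bookkeeping of Table 4 can be sketched in a few lines of Python. This is only an illustrative sketch, not the VM 8.0 reference code: the mode names, the helper names reset_pmv and decode_mvs, and the rule that a frame-mode MV is copied into both field slots of its direction (which is how the "0,1" and "2,3" update entries are read here) are assumptions made for the example.

```python
from typing import Dict, List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) in half-pixel units

# Table 4: which pmv[] entries predict the MVs of the current MB ...
PMV_TO_USE: Dict[str, List[int]] = {
    "direct": [],
    "forward_frame": [0],
    "backward_frame": [2],
    "average_frame": [0, 2],
    "forward_field": [0, 1],
    "backward_field": [2, 3],
    "average_field": [0, 1, 2, 3],
}
# ... and which entries are overwritten after the MB has been coded.
PMV_TO_UPDATE: Dict[str, List[int]] = {
    "direct": [],
    "forward_frame": [0, 1],
    "backward_frame": [2, 3],
    "average_frame": [0, 1, 2, 3],
    "forward_field": [0, 1],
    "backward_field": [2, 3],
    "average_field": [0, 1, 2, 3],
}


def reset_pmv() -> List[MV]:
    """PMVs are reset to zero at the beginning of each row of MBs."""
    return [(0, 0)] * 4


def decode_mvs(mode: str, mvds: List[MV], pmv: List[MV]) -> List[MV]:
    """Add each transmitted difference to its predictor, then update pmv in place."""
    used = PMV_TO_USE[mode]
    if len(mvds) != len(used):
        raise ValueError("expected %d motion vector differences" % len(used))
    decoded: Dict[int, MV] = {}
    for idx, (dx, dy) in zip(used, mvds):
        px, py = pmv[idx]
        decoded[idx] = (px + dx, py + dy)
    for idx in PMV_TO_UPDATE[mode]:
        # Assumption: in frame mode the single MV of a direction fills both
        # the top-field slot (idx) and the bottom-field slot (idx + 1).
        source = idx if idx in decoded else idx - 1
        pmv[idx] = decoded[source]
    return [decoded[idx] for idx in used]
```

A direct mode MB passes an empty difference list and leaves pmv untouched, which matches the "none" entries in the Direct column.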

Intra block DC adaptive prediction can use the same
algorithm as described in MPEG-4 VM 8.0 regardless of the
value of dct_type. Intra block adaptive AC prediction is
performed as described in MPEG-4 VM 8.0 except when the
first row of coefficients is to be copied from the coded block
above. This operation is allowed only if dct_type has the
same value for the current MB and the block above. If the
dct_types differ, then AC prediction can occur only by
copying the first column from the block to the left. If there
is no left block, zero is used for the AC predictors.

FIG. 4 illustrates direct mode coding of the top field of an 
interlaced-coded B-VOP in accordance with the present 
invention. Progressive direct coding mode is used for the 
current macroblock (MB) whenever the MB in a future 
anchor picture which is at the same relative position (e.g., 
co-sited) as the current MB is coded as (1) a 16x16 (frame) 
MB, (2) an intra MB or (3) an 8x8 (advanced prediction) 
MB. 

The direct mode prediction is interlaced whenever the 
co-sited future anchor picture MB is coded as an interlaced 
MB. Direct mode will be used to code the current MB if its 
biased SAD is the minimum of all B-VOP MB predictors. 
Direct mode for an interlaced coded MB forms the predic- 
tion MB separately for the top and bottom fields of the 
current MB. The four field motion vectors (MVs) of a 
bi-directional field motion compensated MB (e.g., top field 
forward, bottom field forward, top field backward, and 
bottom field backward) are calculated directly from the 
respective MVs of the corresponding MB of the future 
anchor picture. 

The technique is efficient since the required searching is
significantly reduced, and the amount of transmitted MV 
data is reduced. Once the MVs and reference field are 
determined, the current MB is considered to be a 
bi-directional field predicted MB. Only one delta motion 
vector (used for both fields) occurs in the bitstream for the 
field predicted MB. 

The prediction for the top field of the current MB is based
on the top field MV of the MB of the future anchor picture
(which can be a P-VOP, or an I-VOP with MV=0), and a past
reference field of a previous anchor picture which is selected
by the corresponding MV of the top field of the future anchor
MB. That is, the top field MB of the future anchor picture
which is correspondingly positioned (e.g., co-sited) to the
current MB has a best match MB in either the top or bottom
field of the past anchor picture. This best match MB is then
used as the anchor MB for the top field of the current MB.
An exhaustive search is used to determine the delta motion
vector MV_D given the co-sited future anchor MV on a MB
by MB basis.



Motion vectors for the bottom field of the current MB are 
similarly determined using the MV of the correspondingly 
positioned bottom field of the future anchor MB, which in 
turn references a best match MB in the top or bottom field 
of the past anchor picture. 

Essentially, the top field motion vector is used to construct
an MB predictor which is the average of (a) pixels obtained
from the top field of the correspondingly positioned future
anchor MB and (b) pixels from the past anchor field referenced
by the top field MV of the correspondingly positioned
future anchor MB. Similarly, the bottom field motion vector
is used to construct an MB predictor which is the average of
(a) pixels obtained from the bottom field of the correspondingly
positioned future anchor MB and (b) pixels from the
past anchor field referenced by the bottom field MV of the
correspondingly positioned future anchor MB.

As shown in FIG. 4, the current B-VOP MB 420 includes
a top field 430 and bottom field 425, the past anchor VOP
MB 400 includes a top field 410 and bottom field 405, and
the future anchor VOP MB 440 includes a top field 450 and
bottom field 445.

The motion vector MV_{top} is the forward motion vector for
the top field 450 of the future anchor MB 440 which
indicates the best match MB in the past anchor MB 400.
Even though MV_{top} is referencing a previous image (e.g.,
backward in time), it is a forward MV since the future
anchor VOP 440 is forward in time relative to the past
anchor VOP 400. In the example, MV_{top} references the
bottom field 405 of the past anchor MB 400, although either
the top 410 or bottom 405 field could be referenced. MV_{f,top}
is the forward MV of the top field of the current MB, and
MV_{b,top} is the backward MV of the top field of the current
MB. Pixel data is derived for the bi-directionally predicted
MB at a decoder by averaging the pixel data in the future and
past anchor images which are identified by MV_{b,top} and
MV_{f,top}, respectively, and summing the averaged image
with a residue which was transmitted.

The motion vectors for the top field are calculated as
follows:

MV_{f,top} = (MV_{top} * TR_{B,top}) / TR_{D,top} + MV_D;

MV_{b,top} = ((TR_{B,top} - TR_{D,top}) * MV_{top}) / TR_{D,top} if MV_D = 0; and

MV_{b,top} = MV_{f,top} - MV_{top} if MV_D ≠ 0.

MV_D is a delta, or offset, motion vector. Note that the motion
vectors are two-dimensional. Additionally, the motion vectors
are integral half-pixel luma motion vectors. The slash
"/" denotes truncate-toward-zero integer division. Also, the
future anchor VOP is always a P-VOP for field direct mode.
If the future anchor was an I-VOP, the MV would be zero
and 16x16 progressive direct mode would be used. TR_{B,top}
is the temporal distance in fields between the past reference
field (e.g., top or bottom), which is the bottom field 405 in
this example, and the top field 430 of the current B-VOP
420. TR_{D,top} is the temporal distance between the past
reference field (e.g., top or bottom), which is the bottom field 
405 in this example, and the future top reference field 450. 
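
The top-field calculation, as recited in claims 2, 8 and 9, can be illustrated with a short Python sketch. The function names div_trunc and direct_field_mvs are hypothetical, and applying the MV_D = 0 test separately to the horizontal and vertical components is an assumption of this sketch. Python's // operator rounds toward negative infinity, so truncation toward zero is written out explicitly.

```python
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical) in half-pixel luma units


def div_trunc(a: int, b: int) -> int:
    """Integer division that truncates toward zero (Python's // floors instead)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def direct_field_mvs(mv_anchor: MV, tr_b: int, tr_d: int,
                     mv_delta: MV = (0, 0)) -> Tuple[MV, MV]:
    """Scale one field MV of the co-sited future anchor MB.

    mv_anchor is MV_top (or MV_bot), tr_b and tr_d are the temporal
    distances TR_B and TR_D in field periods, and mv_delta is MV_D.
    Returns (forward MV, backward MV) for the matching field of the
    current MB.
    """
    forward = []
    backward = []
    for comp, delta in zip(mv_anchor, mv_delta):
        f = div_trunc(comp * tr_b, tr_d) + delta
        if delta == 0:
            b = div_trunc((tr_b - tr_d) * comp, tr_d)
        else:
            b = f - comp
        forward.append(f)
        backward.append(b)
    return (forward[0], forward[1]), (backward[0], backward[1])
```

For example, direct_field_mvs((5, -5), 1, 3) gives a forward MV of (1, -1) and a backward MV of (-3, 3). With floor division the vertical forward component would have been -2, which is why the truncation rule matters for negative vectors.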

FIG. 5 illustrates direct mode coding of the bottom field
of an interlaced-coded B-VOP in accordance with the
present invention. Note that the source interlaced video can
have a top field first or bottom field first format. A bottom
field first format is shown in FIGS. 4 and 5. Like-numbered
elements are the same as in FIG. 4. Here, the motion vector
MV_{bot} is the forward motion vector for the bottom field 445
of the future anchor macroblock (MB) 440 which indicates
the best match MB in the past anchor MB 400. In the
example, MV_{bot} references the bottom field 405 of the past
anchor MB 400, although either the top 410 or bottom 405
field could be used. MV_{f,bot} and MV_{b,bot} are the forward and
backward motion vectors, respectively. 
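
Because the distances in FIGS. 4 and 5 are measured in field periods, they depend on the field order as well as on the frame numbers. The following Python sketch shows one way they could be computed. It assumes two field periods per frame, so that a field distance is twice the frame-number difference plus the temporal correction δ of Table 5; the names TABLE_5, temporal_correction and temporal_distances are hypothetical.

```python
from typing import Dict, Tuple

# Table 5, keyed by (field referenced by the future anchor's top-field MV,
#                    field referenced by the future anchor's bottom-field MV).
# Each row holds delta for: bottom field first (top field, bottom field),
# then top field first (top field, bottom field).
TABLE_5: Dict[Tuple[str, str], Tuple[int, int, int, int]] = {
    ("top", "top"): (0, -1, 0, 1),
    ("top", "bottom"): (0, 0, 0, 0),
    ("bottom", "top"): (1, -1, -1, 1),
    ("bottom", "bottom"): (1, 0, -1, 0),
}


def temporal_correction(ref_of_top: str, ref_of_bottom: str,
                        current_field: str, top_field_first: bool) -> int:
    """Look up delta (in field periods) for the top or bottom field."""
    row = TABLE_5[(ref_of_top, ref_of_bottom)]
    column = (2 if top_field_first else 0) + (0 if current_field == "top" else 1)
    return row[column]


def temporal_distances(tr_future: int, tr_current: int, tr_past: int,
                       delta: int) -> Tuple[int, int]:
    """Return (TR_B, TR_D) in field periods from display-order frame numbers."""
    tr_d = 2 * (tr_future - tr_past) + delta
    tr_b = 2 * (tr_current - tr_past) + delta
    return tr_b, tr_d
```

In the bottom-field-first arrangement of FIG. 4, where the top field of the future anchor references the bottom field of the past anchor, the lookup returns 1: the referenced bottom field precedes the top field of its own frame by one field period.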

The motion vectors for the bottom field are calculated in
a parallel manner to the top field motion vectors, as follows:

MV_{f,bot} = (MV_{bot} * TR_{B,bot}) / TR_{D,bot} + MV_D;

MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) * MV_{bot}) / TR_{D,bot} if MV_D = 0; and

MV_{b,bot} = MV_{f,bot} - MV_{bot} if MV_D ≠ 0.

TR_{B,bot} is the temporal distance between the past reference
field (e.g., top or bottom), which is the bottom field 405 in
this example, and the bottom field 425 of the current B-VOP
420. TR_{D,bot} is the temporal distance between the past
reference field (e.g., top or bottom), which is the bottom field
405 in this example, and the future bottom reference field
445.

Regarding the examples of FIGS. 4 and 5, the calculation
of TR_{B,top}, TR_{D,top}, TR_{B,bot}, and TR_{D,bot} depends not only on
the current field, reference field, and frame temporal
references, but also on whether the current video is top field
first or bottom field first. In particular,

TR_{D,top} or TR_{D,bot} = 2*(TR_{future} - TR_{past}) + δ;

and

TR_{B,top} or TR_{B,bot} = 2*(TR_{current} - TR_{past}) + δ;

where TR_{future}, TR_{current} and TR_{past} are the frame numbers
of the future, current and past frames, respectively, in
display order, and δ, an additive correction to the temporal
distance between fields, is given by Table 5, below. δ has
units of field periods.

For example, the designation "1" in the last row of the first
column indicates that the future anchor field is the top field,
and the referenced field is the bottom field. This is shown in
FIG. 4. The designation "1" in the last row of the second
column indicates that the future anchor field is the bottom
field, and the referenced field is also the bottom field. This
is shown in FIG. 5.

TABLE 5

Temporal correction, δ

   Referenced Field        Bottom Field First      Top Field First

Future      Future
Anchor =    Anchor =       Top        Bottom       Top        Bottom
top         bottom         Field δ    Field δ      Field δ    Field δ

top         top             0         -1            0          1
top         bottom          0          0            0          0
bottom      top             1         -1           -1          1
bottom      bottom          1          0           -1          0

For efficient coding, an appropriate coding mode decision
process is required. As indicated, for B-VOPs, a MB can be
coded using (1) direct coding, (2) 16x16 motion compensated
(includes forward, backward and averaged modes), or
(3) field motion compensation (includes forward, backward
and averaged modes). Frame or field direct coding of a
current MB is used when the corresponding future anchor
MB is frame or field direct coded, respectively.

For a field motion compensated MB in a B-VOP, a
decision is made to code the MB in a forward, backward, or
averaged mode based on the minimum luminance half-pixel
SADs with respect to the decoded anchor pictures.
Specifically, seven biased SAD terms are calculated as
follows:

(1) SAD_{direct} + b_1, (2) SAD_{forward} + b_2, (3) SAD_{backward} + b_2,
(4) SAD_{average} + b_3, (5) SAD_{forward,field} + b_3,
(6) SAD_{backward,field} + b_3, and (7) SAD_{average,field} + b_4,

where the subscripts indicate direct mode, forward motion
prediction, backward motion prediction, average (i.e., interpolated
or bi-directional) motion prediction, frame mode
(i.e., locally progressive) and field mode (i.e., locally
interlaced). The field SADs above (i.e., SAD_{forward,field},
SAD_{backward,field} and SAD_{average,field}) are the sums of the
top and bottom field SADs, each with its own reference field
and motion vector. Specifically,

SAD_{forward,field} = SAD_{forward,top field} + SAD_{forward,bottom field};

SAD_{backward,field} = SAD_{backward,top field} + SAD_{backward,bottom field}; and

SAD_{average,field} = SAD_{average,top field} + SAD_{average,bottom field}.



SAD_{direct} is the best direct mode prediction, SAD_{forward} is
the best 16x16 prediction from the forward (past) reference,
SAD_{backward} is the best 16x16 prediction from the backward
(future) reference, SAD_{average} is the best 16x16 prediction
formed by a pixel-by-pixel average of the best forward and
best backward reference, SAD_{forward,field} is the best field
prediction from the forward (past) reference, SAD_{backward,field}
is the best field prediction from the backward (future)
reference, and SAD_{average,field} is the best field prediction
formed by a pixel-by-pixel average of the best forward and
best backward reference.

The b_i's are bias values as defined in Table 6, below, to
account for prediction modes which require more motion
vectors. Direct mode and modes with fewer MVs are
favored.
TABLE 6

                   Number of
                   motion
Mode               vectors      b_i     Bias             Value

Direct             1            b_1     -(Nb/2 + 1)      -129
Frame, forward     1            b_2     0                0
Frame, backward    1            b_2     0                0
Frame, average     2            b_3     (Nb/4 + 1)       65
Field, forward     2            b_3     (Nb/4 + 1)       65
Field, backward    2            b_3     (Nb/4 + 1)       65
Field, average     4            b_4     (Nb/2 + 1)       129



The negative bias for direct mode is for consistency with the 
existing MPEG-4 VM for progressive video, and may result 
in relatively more skipped MBs. 
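
The decision process can be sketched in Python as follows. This is a minimal illustration rather than the VM 8.0 procedure: the function names are hypothetical, Nb is taken to be 256 because that value reproduces the 129 and 65 entries of Table 6, the top field is assumed to occupy the even lines of the MB, and ties are broken in favor of the mode listed first.

```python
from typing import Dict, List, Sequence

Block = Sequence[Sequence[int]]  # rows of luminance samples


def sad(block: Block, ref: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q)
               for row, ref_row in zip(block, ref)
               for p, q in zip(row, ref_row))


def field_sad(mb: Block, top_ref: Block, bottom_ref: Block) -> int:
    """Field SAD: top-field SAD plus bottom-field SAD, each with its own reference."""
    top_lines: List[Sequence[int]] = list(mb[0::2])     # assumed: even lines
    bottom_lines: List[Sequence[int]] = list(mb[1::2])  # assumed: odd lines
    return sad(top_lines, top_ref) + sad(bottom_lines, bottom_ref)


def bias_values(nb: int = 256) -> Dict[str, int]:
    """Bias terms of Table 6 for a given Nb."""
    b3 = nb // 4 + 1
    b4 = nb // 2 + 1
    return {
        "direct": -b4,
        "forward": 0,
        "backward": 0,
        "average": b3,
        "forward_field": b3,
        "backward_field": b3,
        "average_field": b4,
    }


def select_mode(sads: Dict[str, int], nb: int = 256) -> str:
    """Pick the mode whose biased SAD is smallest."""
    bias = bias_values(nb)
    return min(bias, key=lambda mode: sads[mode] + bias[mode])
```

With SADs of 1000 (direct), 900, 950 and 880 (frame forward, backward, average) and 860, 990 and 700 (field forward, backward, average), the biased terms are 871, 900, 950, 945, 925, 1055 and 829, so the field average mode is chosen despite its four motion vectors.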

FIG. 7 is a block diagram of a decoder in accordance with
the present invention. The decoder, shown generally at 700,
can be used to receive and decode the encoded data signals
transmitted from the encoder of FIG. 2. The encoded video
image data and differentially encoded motion vector (MV)
data are received at terminal 740 and provided to a
demultiplexer (DEMUX) 742. The encoded video image data is
typically differentially encoded in DCT transform
coefficients as a prediction error signal (e.g., residue).

A shape decoding function 744 processes the data when
the VOP has an arbitrary shape to recover shape information,
which is, in turn, provided to a motion compensation
function 750 and a VOP reconstruction function 752. A texture
decoding function 746 performs an inverse DCT on transform
coefficients to recover residue information. For INTRA
coded macroblocks (MBs), pixel information is recovered
directly and provided to the VOP reconstruction function
752.

For INTER coded blocks and MBs, such as those in
B-VOPs, the pixel information provided from the texture
decoding function 746 to the VOP reconstruction function
752 represents a residue between the current MB and a
reference image. The reference image may be pixel data
from a single anchor MB which is indicated by a forward or
backward MV. Alternatively, for an interpolated (e.g.,
averaged) MB, the reference image is an average of pixel
data from two reference MBs, e.g., one past anchor MB and
one future anchor MB. In this case, the decoder must
calculate the averaged pixel data according to the forward
and backward MVs before recovering the current MB pixel
data.

For INTER coded blocks and MBs, a motion decoding
function 748 processes the encoded MV data to recover the
differential MVs and provide them to the motion compensation
function 750 and to a motion vector memory 749,
such as a RAM. The motion compensation function 750
receives the differential MV data and determines a reference
motion vector (e.g., predictor motion vector, or PMV) in
accordance with the present invention. The PMV is determined
according to the coding mode (e.g., forward,
backward, bi-directional, or direct).

Once the motion compensation function 750 determines a
full reference MV and sums it with the differential MV of the
current MB, the full MV of the current MB is available.
Accordingly, the motion compensation function 750 can
now retrieve anchor frame best match data from a VOP
memory 754, such as a RAM, calculate an averaged image
if required, and provide the anchor frame pixel data to the
VOP reconstruction function to reconstruct the current MB.

The retrieved or calculated best match data is added back 
to the pixel residue at the VOP reconstruction function 752 
to obtain the decoded current MB or block. The recon- 
structed block is output as a video output signal and also 
provided to the VOP memory 754 to provide new anchor 
frame data. Note that an appropriate video data buffering 
capability may be required depending on the frame trans- 
mission and presentation orders since an anchor frame for a 
B-VOP MB may be a temporally future frame or field, in 
presentation order. 
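
The reconstruction step performed at function 752 can be sketched as below. The sketch assumes 8-bit samples clipped to the range 0 to 255 and a rounded average of the two predictions, neither of which is specified in this passage; the function name reconstruct_block is hypothetical.

```python
from typing import List, Optional, Sequence

Block = Sequence[Sequence[int]]


def reconstruct_block(residue: Block, forward_pred: Block,
                      backward_pred: Optional[Block] = None) -> List[List[int]]:
    """Add the decoded residue to the (possibly averaged) best match data."""
    out: List[List[int]] = []
    for y, res_row in enumerate(residue):
        row: List[int] = []
        for x, res in enumerate(res_row):
            pred = forward_pred[y][x]
            if backward_pred is not None:
                # Interpolated MB: average the two anchors (rounding assumed).
                pred = (pred + backward_pred[y][x] + 1) // 2
            # Clip to the assumed 8-bit sample range.
            row.append(max(0, min(255, pred + res)))
        out.append(row)
    return out
```

Passing only forward_pred covers the single-anchor case (forward or backward prediction), while supplying both predictions covers the interpolated case described above.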

FIG. 8 illustrates a MB packet structure in accordance
with the present invention. The structure is suitable for
B-VOPs, and indicates the format of data received by the
decoder. Note that the packets are shown in four rows for
convenience only. The packets are actually transmitted
serially, starting from the top row, and from left to right
within a row. The first row 810 includes fields first_shape_code,
MVD_shape, CR, ST and BAC. A second row 830
includes fields MODB and MBTYPE. A third row 850
includes fields CBPB, DQUANT, Interlaced_information,
MVD_f, MVD_b, and MVDB. A fourth row includes fields
CODA, CBPBA, Alpha Block Data and Block Data. Each of
the above fields is defined according to MPEG-4 VM 8.0.

first_shape_code indicates whether a MB is in a bounding
box of a VOP. CR indicates a conversion ratio for Binary
Alpha Blocks. ST indicates a horizontal or vertical scan
order. BAC refers to a binary arithmetic codeword.

MODB, which indicates the mode of a MB, is present for
every coded (non-skipped) MB in a B-VOP. Difference
motion vectors (MVD_f, MVD_b, or MVDB) and CBPB are
present if indicated by MODB. Macroblock type is indicated
by MBTYPE, which also signals motion vector modes
(MVDs) and quantization (DQUANT). With interlaced
mode, there can be up to four MVs per MB. MBTYPE
indicates the coding type, e.g., forward, backward,
bi-directional or direct. CBPB is the Coded Block Pattern for
a B-type macroblock. CBPBA is similarly defined as CBPB
except that it has a maximum of four bits. DQUANT defines
changes in the value of a quantizer.

The field Interlaced_information in the third row 850 
indicates whether a MB is interlaced coded, and provides 
field MV reference data which informs the decoder of the 
coding mode of the current MB or block. The decoder uses 
this information in calculating the MV for a current MB. The 
Interlaced_information field may be stored for subsequent 
use as required in the MV memory 749 or other memory in 
the decoder. 

The Interlaced_information field may also include a flag
dct_type which indicates whether top and bottom field pixel
lines in a field coded MB are reordered from the interleaved
order, as discussed above in connection with FIG. 6.

The MB layer structure shown is used when
VOP_prediction_type=="10". If COD indicates skipped
(COD=="1") for a MB in the most recently decoded I- or P-VOP then
the co-located (e.g., co-sited) MB in the B-VOP is also
skipped. That is, no information is included in the bitstream.

MVD_f is the motion vector of a MB in a B-VOP with
respect to a temporally previous reference VOP (an I- or a
P-VOP). It consists of a variable length codeword for the
horizontal component followed by a variable length codeword
for the vertical component. For an interlaced B-VOP
MB with field_prediction of "1" and MBTYPE of forward
or interpolate, MVD_f represents a pair of field motion
vectors (top field followed by bottom field) which reference
the past anchor VOP.
MVD_b is the motion vector of a MB in a B-VOP with
respect to a temporally following reference VOP (an I- or a
P-VOP). It consists of a variable length codeword for the
horizontal component followed by a variable length codeword
for the vertical component. For an interlaced B-VOP
MB with field_prediction of "1" and MBTYPE of backward
or interpolate, MVD_b represents a pair of field MVs (top
field followed by bottom field) which reference the future
anchor VOP.

MVDB is only present in B-VOPs if direct mode is
indicated by MODB and MBTYPE, and consists of a
variable length codeword for the horizontal component
followed by a variable length codeword for the vertical
component of each vector. MVDB represents delta vectors
that are used to correct B-VOP MB motion vectors which
are obtained by scaling P-VOP MB motion vectors.

CODA refers to gray scale shape coding. 

The arrangement shown in FIG. 8 is an example only, and
various other arrangements for communicating the
relevant information to the decoder will become apparent to
those skilled in the art.

A bitstream syntax and MB layer syntax for use in 
accordance with the present invention is described in 
MPEG-4 VM 8.0 as well as the Eifrig et al. application 
referred to previously. 

Accordingly, it can be seen that the present invention 
provides a scheme for encoding a current MB in a B-VOP, 
in particular, when the current MB is field coded, and/or an 
anchor MB is field coded. A scheme for direct coding for a 
field coded MB is presented, in addition to a coding decision 
process which uses the minimum of sum of absolute differ- 
ences terms to select an optimum mode. A prediction motion 
vector (PMV) is also provided for the top and bottom field 
of a field coded current MB, including forward and back- 
ward PMVs as required, as well as for frame coded MBs. 

Although the invention has been described in connection 
with various specific embodiments, those skilled in the art 
will appreciate that numerous adaptations and modifications 
may be made thereto without departing from the spirit and 
scope of the invention as set forth in the claims.

What is claimed is: 

1. A method for calculating direct mode motion vectors 
for a current bi-directionally predicted, field coded image 
having top and bottom fields, in a sequence of digital video
images, comprising the steps of: 

determining a past field coded reference image having top 
and bottom fields, and a future field coded reference 
image having top and bottom fields; 

wherein the future image is predicted using the past image
such that MV_{top}, a forward motion vector of the top
field of the future image, references one of the top and
bottom fields of the past image, and MV_{bot}, a forward
motion vector of the bottom field of the future image,
references one of the top and bottom fields of said past
image; and

determining forward and backward motion vectors for 
predicting at least one of the top and bottom fields of 
the current image by scaling the forward motion vector 
of the corresponding field of the future image. 

2. The method of claim 1, wherein:

MV_{f,top}, the forward motion vector for predicting the top
field of the current image, is determined according to
the expression (MV_{top}*TR_{B,top})/TR_{D,top}+MV_D;
where TR_{B,top} corresponds to a temporal spacing between
the top field of the current image and the field of the
past image which is referenced by MV_{top}, TR_{D,top}
corresponds to a temporal spacing between the top field
of the future image and the field of the past image
which is referenced by MV_{top}, and MV_D is a delta
motion vector.

3. The method of claim 2, wherein:

MV_{f,top} is determined using integer division with
truncation toward zero; and

MV_{top} and MV_{bot} are integer half-luma pel motion
vectors.

4. The method of claim 2, wherein:

TR_{B,top} and TR_{D,top} incorporate a temporal correction
which accounts for whether said current field coded
image is top field first or bottom field first.

5. The method of claim 1, wherein:

MV_{f,bot}, the forward motion vector for predicting the
bottom field of the current image, is determined according
to the expression (MV_{bot}*TR_{B,bot})/TR_{D,bot}+MV_D;

where TR_{B,bot} corresponds to a temporal spacing between
the bottom field of the current image and the field of the
past image which is referenced by MV_{bot}, TR_{D,bot}
corresponds to a temporal spacing between the bottom
field of the future image and the field of the past image
which is referenced by MV_{bot}, and MV_D is a delta
motion vector.

6. The method of claim 5, wherein:

MV_{f,bot} is determined using integer division with
truncation toward zero; and

MV_{top} and MV_{bot} are integer half-luma pel motion
vectors.

7. The method of claim 5, wherein:

TR_{B,bot} and TR_{D,bot} incorporate a temporal correction
which accounts for whether said current field coded
image is top field first or bottom field first.

8. The method of claim 1, wherein:

MV_{b,top}, the backward motion vector for predicting the
top field of the current image, is determined according
to one of the equations (a) MV_{b,top}=((TR_{B,top}-TR_{D,top})
*MV_{top})/TR_{D,top} and (b) MV_{b,top}=MV_{f,top}-MV_{top};

where TR_{B,top} corresponds to a temporal spacing between
the top field of the current image and the field of the
past image which is referenced by MV_{top}, TR_{D,top}
corresponds to a temporal spacing between the top field
of the future image and the field of the past image
which is referenced by MV_{top}, and MV_{f,top} is the
forward motion vector for predicting the top field of the
current image.

9. The method of claim 8, wherein:

said equation (a) is selected when a delta motion vector
MV_D=0, and said equation (b) is selected when
MV_D≠0.

10. The method of claim 1, wherein:

MV_{b,bot}, the backward motion vector for predicting the
bottom field of the current image, is determined according
to one of the equations (a) MV_{b,bot}=((TR_{B,bot}-
TR_{D,bot})*MV_{bot})/TR_{D,bot} and (b)
MV_{b,bot}=MV_{f,bot}-MV_{bot};

where TR_{B,bot} corresponds to a temporal spacing between
the bottom field of the current image and the field of the
past image which is referenced by MV_{bot}, TR_{D,bot}
corresponds to a temporal spacing between the bottom
field of the future image and the field of the past image
which is referenced by MV_{bot}, and MV_{f,bot} is the
forward motion vector for predicting the bottom field of
the current image.

11. The method of claim 10, wherein:

said equation (a) is selected when a delta motion vector
MV_D=0, and said equation (b) is selected when
MV_D≠0.

12. A method for selecting a coding mode for a current
predicted, field coded macroblock having top and bottom 
fields, in a sequence of digital video images, comprising the 
steps of: 

determining a forward sum of absolute differences error,
SAD_{forward,field}, for the current macroblock relative to a
past reference macroblock, which corresponds to a
forward coding mode;

determining a backward sum of absolute differences error,
SAD_{backward,field}, for the current macroblock relative to
a future reference macroblock, which corresponds to a
backward coding mode;

determining an average sum of absolute differences error,
SAD_{average,field}, for the current macroblock relative to
an average of said past and future reference
macroblocks, which corresponds to an average coding
mode; and

selecting said coding mode according to the minimum of
said SADs.

13. The method of claim 12, comprising the further step of:

selecting said coding mode according to the minimum of 
respective sums of said SADs with corresponding bias 
terms which account for the number of required motion 
vectors of the respective coding modes. 

14. The method of claim 12, wherein:

SAD_{forward,field} is determined according to a sum of: (a) a
sum of absolute differences for the top field of the
current macroblock relative to a top field of the past
reference macroblock, and (b) a sum of absolute
differences for the bottom field of the current macroblock
relative to a bottom field of the past reference
macroblock.

15. The method of claim 12, wherein:

SAD_{backward,field} is determined according to a sum of: (a)
a sum of absolute differences for the top field of the
current macroblock relative to a top field of the future
reference macroblock, and (b) a sum of absolute
differences for the bottom field of the current macroblock
relative to a bottom field of the future reference
macroblock.

16. The method of claim 12, wherein:

SAD_{average,field} is determined according to a sum of: (a) a
sum of absolute differences for the top field of the
current macroblock relative to an average of the top
fields of the past and future reference macroblocks, and
(b) a sum of absolute differences for the bottom field of
the current macroblock relative to an average of the
bottom fields of the past and future reference
macroblocks.

17. A decoder for recovering a current, direct mode, field 
coded macroblock having top and bottom fields in a 
sequence of digital video macroblocks from a received 
bitstream, wherein said current macroblock is 
bi-directionally predicted using a past field coded reference 
macroblock having top and bottom fields, and a future field 
coded reference macroblock having top and bottom fields, 
comprising: 

means for recovering MV_{top}, a forward motion vector of
the top field of the future macroblock which references
one of the top and bottom fields of the past macroblock,
and MV_{bot}, a forward motion vector of the bottom field
of the future macroblock which references one of the
top and bottom fields of said past macroblock; and

means for determining forward and backward motion
vectors for predicting at least one of the top and bottom
fields of the current macroblock by scaling the forward
motion vector of the corresponding field of the future
macroblock.

18. The decoder of claim 17, further comprising:

means for determining MV_{f,top}, the forward motion vector
for predicting the top field of the current macroblock,
according to the expression (MV_{top}*TR_{B,top})/TR_{D,top}+
MV_D;

where TR_{B,top} corresponds to a temporal spacing between
the top field of the current macroblock and the field of
the past macroblock which is referenced by MV_{top},
TR_{D,top} corresponds to a temporal spacing between the
top field of the future macroblock and the field of the
past macroblock which is referenced by MV_{top}, and
MV_D is a delta motion vector.

19. The decoder of claim 18, wherein: 

MV_{f,top} is determined using integer division with
truncation toward zero; and

MV_{top} and MV_{bot} are integer half-luma pel motion
vectors.
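
The forward-vector computation recited in claims 18 and 19 can be illustrated with a minimal Python sketch. The function names (div_trunc, forward_mv), the tuple representation of a vector as (x, y) in half-luma-pel integer units, and the assumption that the temporal spacing TR_D is non-zero are illustrative choices and are not taken from the claims. Python's `//` operator rounds toward negative infinity, so truncation toward zero is implemented explicitly.

```python
def div_trunc(num: int, den: int) -> int:
    """Integer division that truncates toward zero.

    Python's // floors toward negative infinity, so the sign is handled
    separately. Assumes den != 0.
    """
    q = abs(num) // abs(den)
    return q if (num >= 0) == (den > 0) else -q


def forward_mv(mv_field, tr_b, tr_d, mv_delta):
    """Forward vector for one field of the current macroblock.

    Computes (MV * TR_B) / TR_D + MV_D per component, where
    mv_field is the forward vector of the co-sited future-anchor field,
    tr_b and tr_d are temporal spacings in field periods, and
    mv_delta is the delta motion vector. All vectors are (x, y) tuples
    in half-luma-pel integer units (a hypothetical representation).
    """
    return tuple(div_trunc(c * tr_b, tr_d) + d
                 for c, d in zip(mv_field, mv_delta))
```

For example, `forward_mv((-7, 5), 1, 3, (0, 0))` gives `(-2, 1)`; a flooring division would instead give -3 for the first component, which is why the rounding rule matters for negative vectors.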

20. The decoder of claim 18, wherein: 

TR_{B,top} and TR_{D,top} incorporate a temporal correction
which accounts for whether said current field coded
image is top field first or bottom field first.

21. The decoder of claim 17, further comprising:
means for determining MV_{f,bot}, the forward motion vector
for predicting the bottom field of the current
macroblock, according to the expression
(MV_{bot} * TR_{B,bot}) / TR_{D,bot} + MV_D;

where TR_{B,bot} corresponds to a temporal spacing between
the bottom field of the current macroblock and the field
of the past macroblock which is referenced by MV_{bot},
TR_{D,bot} corresponds to a temporal spacing between the
bottom field of the future macroblock and the field of
the past macroblock which is referenced by MV_{bot}, and
MV_D is a delta motion vector.

22. The decoder of claim 21, wherein: 

MV_{f,bot} is determined using integer division with
truncation toward zero; and

MV_{top} and MV_{bot} are integer half-luma pel motion
vectors.

23. The decoder of claim 21, wherein: 

TR_{B,bot} and TR_{D,bot} incorporate a temporal correction
which accounts for whether said current field coded
image is top field first or bottom field first.

24. The decoder of claim 17, further comprising: 
means for determining MV_{b,top}, the backward motion
vector for predicting the top field of the current
macroblock, according to one of the equations
(a) MV_{b,top} = ((TR_{B,top} - TR_{D,top}) * MV_{top}) / TR_{D,top} and
(b) MV_{b,top} = MV_{f,top} - MV_{top};

where TR_{B,top} corresponds to a temporal spacing between
the top field of the current macroblock and the field of
the past macroblock which is referenced by MV_{top},
TR_{D,top} corresponds to a temporal spacing between the
top field of the future macroblock and the field of the
past macroblock which is referenced by MV_{top}, and
MV_{f,top} is the forward motion vector for predicting the
top field of the current macroblock.

25. The decoder of claim 24, further comprising: 

means for selecting said equation (a) when a delta motion
vector MV_D = 0; and

means for selecting said equation (b) when MV_D ≠ 0.

26. The decoder of claim 17, further comprising:
means for determining MV_{b,bot}, the backward motion
vector for predicting the bottom field of the current
macroblock, according to one of the equations
(a) MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) * MV_{bot}) / TR_{D,bot} and
(b) MV_{b,bot} = MV_{f,bot} - MV_{bot};

where TR_{B,bot} corresponds to a temporal spacing between
the bottom field of the current macroblock and the field
of the past macroblock which is referenced by MV_{bot},
TR_{D,bot} corresponds to a temporal spacing between the
bottom field of the future macroblock and the field of
the past macroblock which is referenced by MV_{bot}, and
MV_{f,bot} is the forward motion vector for predicting the
bottom field of the current macroblock.

27. The decoder of claim 26, further comprising:
means for selecting said equation (a) when a delta motion
vector MV_D = 0; and

means for selecting said equation (b) when MV_D ≠ 0.
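
The selection between equations (a) and (b) recited in claims 24 through 27 can be sketched in Python as follows. This is an illustrative sketch only: the function names, the (x, y) tuple representation in half-luma-pel integer units, and a non-zero TR_D are assumptions, and the use of truncating division in equation (a) is carried over from claims 19 and 22 rather than stated in claims 24 through 27.

```python
def div_trunc(num: int, den: int) -> int:
    """Integer division that truncates toward zero (assumes den != 0)."""
    q = abs(num) // abs(den)
    return q if (num >= 0) == (den > 0) else -q


def backward_mv(mv_field, mv_fwd, tr_b, tr_d, mv_delta):
    """Backward vector for one field of the current macroblock.

    mv_field: forward vector of the co-sited future-anchor field.
    mv_fwd:   forward vector already derived for the current field.
    tr_b, tr_d: temporal spacings in field periods.
    mv_delta: delta motion vector; it only selects the equation here.
    """
    if mv_delta == (0, 0):
        # Equation (a): scale the anchor-field vector by (TR_B - TR_D) / TR_D.
        return tuple(div_trunc((tr_b - tr_d) * c, tr_d) for c in mv_field)
    # Equation (b): forward vector minus the anchor-field vector.
    return tuple(f - c for f, c in zip(mv_fwd, mv_field))
```

With a zero delta vector, `backward_mv((-7, 5), (-2, 1), 1, 3, (0, 0))` evaluates equation (a) and gives `(4, -3)`; with a delta of `(1, 0)` and a forward vector of `(-1, 1)`, equation (b) gives `(6, -4)`.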

* * * * *